Built with R 4.2.2
The basic premise behind any form of redistribution is that a given set of observation can be translated to some superset or subset.
This means the basic requirements for redistribution are:
- a
sourcewith observations - a
targetwithout observations - a
mapbetweensourceandtargetentities
The map associates a given source ID to a given target ID. If one source ID maps onto many target IDs, then we are disaggregating the source observation (translating one observation to many). If each source ID maps into a single target ID (and potentially many source IDs are mapping onto that target ID), then we are aggregating the source observations (translating many observations to one).
Basic Example
Say we have 1 set of 5 regions:
regions <- data.frame(id = 1:5)
regions
#> id
#> 1 1
#> 2 2
#> 3 3
#> 4 4
#> 5 5Disaggregate
If we have a single observation for the entire set, we could disaggregate it to the regions. With no additional information, our best guess for the value of each region is a proportional split – in this case, one fifth of the observed value for each region:
# install if needed: remotes::install_github("uva-bi-sdad/redistribute")
library(redistribute)
set_value <- 1
(redistribute(set_value, regions, verbose = TRUE))
#> ℹ source IDs: 1
#> ℹ target IDs: id column of `target`
#> ℹ map: all target IDs for single source
#> ℹ weights: 1
#> ℹ redistributing 1 variable from 1 source to 5 targets:
#> • (numb; 1) V1
#> ℹ disaggregating...
#> ✔ done disaggregating [9ms]
#>
#> id V1
#> 1 1 0.2
#> 2 2 0.2
#> 3 3 0.2
#> 4 4 0.2
#> 5 5 0.2Or, maybe we know a little more about the regions, such as their size; then we could use that information to adjust the proportion allotted to each region:
region_values <- redistribute(
set_value, regions,
weight = c(1, 10, 10, 20, 50), verbose = TRUE
)
#> ℹ source IDs: 1
#> ℹ target IDs: id column of `target`
#> ℹ map: all target IDs for single source
#> ℹ weights: `weight` vector
#> ℹ redistributing 1 variable from 1 source to 5 targets:
#> • (numb; 1) V1
#> ℹ disaggregating...
#> ✔ done disaggregating [10ms]
#>
region_values
#> id V1
#> 1 1 0.01098901
#> 2 2 0.10989011
#> 3 3 0.10989011
#> 4 4 0.21978022
#> 5 5 0.54945055Aggregate
If we have observations for all regions, we could aggregate to a single value for the set. In this case, we can re-aggregate what we initially disaggregated, to recover the original observation:
(redistribute(region_values, verbose = TRUE))
#> ℹ source IDs: id column of `source`
#> ℹ target IDs: 1
#> ℹ map: all source IDs to a single target
#> ℹ weights: 1
#> ℹ redistributing 1 variable from 5 sources to 1 target:
#> • (numb; 1) V1
#> ℹ aggregating...
#> ✔ done aggregating [5ms]
#>
#> id V1
#> 1 1 1Applied Example
One use case for redistribution is converting demographics data between geographic layers.
For illustration, we can look at U.S. Census data in Fairfax, Virginia:
base_dir <- "~/Downloads"
# remotes::install_github("uva-bi-sdad/catchment")
library(catchment)
library(sf)
# download population and data shapes
population <- download_census_population(base_dir, "VA", 2020)$estimates
#> ℹ loading existing Virginia population data
population <- population[grepl("^51(?:059|600)", population$GEOID), ]
population[, -1] <- vapply(
population[, -1], as.numeric, numeric(nrow(population))
)
rownames(population) <- population$GEOID
block_groups <- st_transform(
download_census_shapes(base_dir, "VA", "bg", year = 2020), "WGS84"
)
block_groups <- block_groups[block_groups$GEOID %in% population$GEOID, ]
population <- population[block_groups$GEOID, ]
st_geometry(population) <- block_groups$geometry
population$Overall <- population$TOTAL.POPULATION_Total
# prepare a map
library(leaflet)
pal <- colorNumeric(scico::scico(255, palette = "lajolla"), population$Overall)
map <- leaflet(
population,
options = leafletOptions(attributionControl = FALSE)
) |>
addProviderTiles("CartoDB.Positron") |>
addScaleBar("bottomleft") |>
addControl("Total Population", "topright") |>
addLayersControl(
position = "topleft", overlayGroups = "Block Groups",
baseGroups = c(
"Zip Codes", "Parcels -> Zip Codes", "Block Groups -> Zip Codes",
"Block Groups -> Parcels -> Zip Codes"
)
) |>
addLegend(
"bottomright", pal, ~Overall,
title = "Block Groups", opacity = 1
) |>
addPolygons(
fillColor = ~ pal(Overall), fillOpacity = 1, weight = 1,
color = "#000", highlightOptions = highlightOptions(color = "#fff"),
group = "Block Groups", label = ~ paste(
GEOID, "Population:", round(Overall, 3)
)
)This population information is provided at the Census Block Group level at the lowest, but we might want to look at population within zip codes.
We could try aggregating directly from block groups to zip codes, which involves calculating proportional intersections between region polygons:
# Download shapes if needed
zipcode_file <- paste0(base_dir, "/zipcode_va_fairfax.geojson")
if (!file.exists(zipcode_file)) {
download.file(paste0(
"https://www.fairfaxcounty.gov/euclid/rest/services/IPLS/IPLSMap",
"/MapServer/3/query?where=1=1&outFields=*&outSR=4326&f=geojson"
), zipcode_file)
}
zipcodes <- read_sf(zipcode_file)
# redistribute population data from block groups
zipcode_population <- redistribute(
population, zipcodes,
target_id = "ZIPCODE", verbose = TRUE
)
#> ℹ source IDs: GEOID column of `source`
#> ℹ target IDs: ZIPCODE column of `target`
#> ℹ map: intersections between geometries
#> ℹ mapping...
#> ✔ done mapping [5.8s]
#>
#> ℹ weights: 1
#> ℹ redistributing 116 variables from 693 sources to 80 targets:
#> • (numb; 116)
#> ℹ aggregating...
#> ✔ done aggregating [7ms]
#> We could also disaggregate to the parcel level, which are point locations, then aggregate to zipcodes:
# Download shapes if needed
## https://data-fairfaxcountygis.opendata.arcgis.com/datasets/current-population
parcel_file <- paste0(base_dir, "/parcel_va_fairfax.geojson")
if (!file.exists(parcel_file)) {
download.file(paste0(
"https://opendata.arcgis.com/api/v3/datasets/",
"314bfe4019754952a715be3a33384d9d_0/downloads/data",
"?format=geojson&spatialRefId=4326&where=1=1"
), parcel_file)
}
parcels <- read_sf(parcel_file)
# disaggregate population data from block groups to parcels
bg_parcel_population <- redistribute(
population, parcels,
weight = "CURRE_POPUL", verbose = TRUE
)
#> ℹ source IDs: GEOID column of `source`
#> ℹ target IDs: sequence, assuming map from geometries
#> ℹ map: intersections between geometries
#> ℹ mapping...
#> ✔ done mapping [1.9s]
#>
#> ℹ weights: CURRE_POPUL column of `target`
#> ℹ some `target_id`s were dropped because they were not present in `map`
#> ℹ redistributing 116 variables from 693 sources to 336375 targets:
#> • (numb; 116)
#> ℹ disaggregating...
#> ✔ done disaggregating [2.9s]
#>
#> ℹ realigning with original target IDs
# then aggregate from parcels to zip codes
bg_parcel_zipcode_population <- redistribute(
bg_parcel_population, zipcodes,
source_id = "id", target_id = "ZIPCODE", verbose = TRUE
)
#> ℹ source IDs: id column of `source`
#> ℹ target IDs: ZIPCODE column of `target`
#> ℹ map: intersections between geometries
#> ℹ mapping...
#> ✔ done mapping [3.1s]
#>
#> ℹ weights: 1
#> ℹ some `target_id`s were dropped because they were not present in `map`
#> ℹ redistributing 116 variables from 336407 sources to 55 targets:
#> • (numb; 116)
#> ℹ aggregating...
#> ✔ done aggregating [7.7s]
#>
#> ℹ realigning with original target IDs
# since it's provided in this case, we can also just aggregate
# up from parcels directly
parcel_zipcode_population <- redistribute(
parcels, zipcodes,
source_id = "PIN", target_id = "ZIPCODE", verbose = TRUE
)
#> ℹ source IDs: PIN column of `source`
#> ℹ target IDs: ZIPCODE column of `target`
#> ℹ map: intersections between geometries
#> ℹ mapping...
#> ✔ done mapping [2.9s]
#>
#> ℹ weights: 1
#> ℹ some `target_id`s were dropped because they were not present in `map`
#> ℹ redistributing 7 variables from 336407 sources to 55 targets:
#> • (numb; 5) OBJECTID, PARCE_ID, CURRE_POPUL, LOW_ESTIM_POPUL, HIGH_ESTIM_POPUL
#> • (char; 2) VALID_FROM, VALID_TO
#> ℹ aggregating...
#> ✔ done aggregating [6.7s]
#>
#> ℹ re-converting categorical levels
#> ℹ realigning with original target IDsNow we can compare the estimated total population of these different methods with those provided:
all_values <- c(
zipcodes$POPULATION, zipcode_population$Overall,
bg_parcel_zipcode_population$Overall, parcel_zipcode_population$CURRE_POPUL
)
pal_zip <- colorNumeric(scico::scico(255, palette = "lajolla"), all_values)
map |>
addPolygons(
data = zipcodes,
fillColor = ~ pal_zip(POPULATION), fillOpacity = .8, weight = 1,
color = "#000", highlightOptions = highlightOptions(color = "#fff"),
group = "Zip Codes", label = ~ paste(
ZIPCODE, "Population:", round(POPULATION, 3)
)
) |>
hideGroup("Zip Codes") |>
addPolygons(
data = parcel_zipcode_population,
fillColor = ~ pal_zip(CURRE_POPUL), fillOpacity = .8, weight = 1,
color = "#000", highlightOptions = highlightOptions(color = "#fff"),
group = "Parcels -> Zip Codes", label = ~ paste(
id, "Population:", round(CURRE_POPUL, 3)
)
) |>
hideGroup("Parcels -> Zip Codes") |>
addPolygons(
data = zipcode_population,
fillColor = ~ pal_zip(Overall), fillOpacity = .8, weight = 1,
color = "#000", highlightOptions = highlightOptions(color = "#fff"),
group = "Block Groups -> Zip Codes", label = ~ paste(
id, "Population:", round(Overall, 3)
)
) |>
hideGroup("Block Groups -> Zip Codes") |>
addPolygons(
data = bg_parcel_zipcode_population,
fillColor = ~ pal_zip(Overall), fillOpacity = .8, weight = 1,
color = "#000", highlightOptions = highlightOptions(color = "#fff"),
group = "Block Groups -> Parcels -> Zip Codes", label = ~ paste(
id, "Population:", round(Overall, 3)
)
) |>
showGroup("Block Groups -> Parcels -> Zip Codes") |>
addLegend("bottomright", pal_zip, all_values, opacity = 1, title = "Zip Codes")